 needs to be fixed.
Opioid prescription has become a popular topic because of the increasing prescription and concerns about overdose, opioid-related deaths, and other opioid-associated side effects. (Ref: Paulozzi et al. 2011, Rudd et al. 2016, Rudd et al. 2016, Solomon et al. 2010, Dunn, et al. 2010. )
Our motivation is to understand the opioid prescription in the States using publicly available data. Thus, the main questions we are going to ask include:
Among the 50 States and District of Columbia, which five States had the highest and lowest number in
Did the total number of opioid prescriptions change over time?
Can you visualize the opioid prescription rate on the U.S map?
In this case study, we’ll walk you through collecting data, importing data, wrangling data, and visualizing the data, using the well-established and commonly used packages, including kableExtra, readr , tidyverse , datasets , ggplot2 , scales , ggrepel , choroplethr , and choroplethrMaps . Here we provide an overview aobut these packages before we starts:
| R package | What can the package provide? |
|---|---|
kableExtra |
Helps with building complex tables and manipulating table styles; creates awesome HTML tables |
readr |
Allows you to read data from various formats faster |
tidyverse |
Provides user-friendly data manipulation |
datasets |
Gives us useful information, including the States’ information |
ggplot2 |
Improves data visualisations |
scales |
Adds the flexibility of scaling |
ggrepel |
Avoids superimposing the legends you don’t expect |
choroplethr |
Helps preparing for plotting the U.S. Map |
choroplethrMaps |
Helps us plot the U.S. Map |
Recently, the Centers for Medicare & Medicaid Services (CMS) has prepared a public dataset, including information in the part D covering calendar years 2013 through 2016, to make our health care system more transparent, and to allow citizens to understand the medical expenditure and relevant health issues. You can find the description of the data here:
Medicare part D dataset
Medicare Part D is prescription drug coverage plan run by Center for Medicare & Medicaid Services (CMS). Here is a website describing the information about Part D It was originally proposed by President Bill Clinton in 2000 based on earlier proposals developed by Congresswoman Nancy Pelosi and Senator Tom Daschle, and started in January 2006. There are two ways for participants to enroll in Medicare part D:
That means that for those who have already enrolled in Medicare (either >= 65 years old or having certain diseases such as end-stage-renal-disease or other disabilities), they are eligible to join the Part D. (For more information on eligibility over 65 years old, click here and for Medicare eligibility for those under 65, click here). For people enrolled in other insurance plan, they may also be eligible to apply for the plan. (For more information on how Medicare works with other types of insurance, click here).
First, let’s download the opioid prescriber data from the years 2013-2016, which is available from the CMS website as an application programming interface (API).
To do this, you can type “opioid” in the CMS website.. The following is what you will see:
Now you can click on the “Medicare Part D Opioid Prescriber Summary File 2013”, and you will be directed to the next page:
Further, you can click on the “Export”, and choose the “csv”, you will be guided to the download link. This is how we found the api link for downloading the data. The screen shot is shown as below:
if(!file.exists("./data/2016prescriber.csv")){
file_link <- "https://data.cms.gov/api/views/6wg9-kwip/rows.csv"
download.file(file_link,
destfile = "./data/2016prescriber.csv",mode = "wb")
}
if(!file.exists("./data/2015prescriber.csv")){
file_link <- "https://data.cms.gov/api/views/6i2k-7h8p/rows.csv"
download.file(file_link,
destfile = "./data/2015prescriber.csv",mode = "wb")
}
if(!file.exists("./data/2014prescriber.csv")){
file_link <- "https://data.cms.gov/api/views/e4ka-3ncx/rows.csv"
download.file(file_link,
destfile = "./data/2014prescriber.csv",mode = "wb")
}
if(!file.exists("./data/2013prescriber.csv")){
file_link <- "https://data.cms.gov/api/views/yb2j-f3fp/rows.csv"
download.file(file_link,
destfile = "./data/2013prescriber.csv",mode = "wb")
}There is a description about the variables. It can be found in the same page mentioned above. Here is a screenshot showing how you can find the information. For your convenience, we have saved it as a csv file (
prescriberopioiddescription.csv) so that you can understand the meanings of the variables easily. This file describes what data is contained in each column. We use the commands in the kableExtra package to show the file to give us a nice display of the table in the windows.
The files we just downloaded contain the following information:
library(kableExtra)
library(tidyverse)
partdinfodescription = read.csv("./doc2/prescriberopioiddescription.csv")
names(partdinfodescription)[1] <- "Column Name"
partdinfodescription %>%
select(`Column Name`,Description) %>%
kable() %>%
kable_styling() %>%
column_spec(1, bold = T, border_right = T, background = "white") %>%
column_spec(2, width = "30em", background = "lightyellow")| Column Name | Description |
|---|---|
| NPI | National Provider Identifier |
| NPPES Provider Last/Org Name | Last Name/Organization Name of the Provider |
| NPPES Provider First Name | First Name of the Provider |
| NPPES Provider Zip Code | Zip Code of the Provider |
| NPPES Provider State | State Code of the Provider |
| Specialty Description | Provider Specialty Type |
| Total Claim Count | The number of Medicare Part D drug claims, including original prescriptions and refills. |
| Opioid Claim Count | The number of Medicare Part D opioid drug claims, including original prescriptions and refills. For a list of drugs that include opioids, visit <https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/PartD_DrugCatList_2014.zip.>; |
| Opioid Prescribing Rate | The number of Opioid Claims divided by the Overall Claims and multiplied by 100. |
At this stage, you’ve downloaded your data from CMS API successfully. Now, what you need is to load or read these data in R. There are several base R functions that allow you read your data into R, which you may be familiar with such as read.table(), read.csv(), and read.delim(). Instead of using these, we will use the functions in the readr R package. The main reasons for this are
readr are around 10x faster.The main functions in readr include:
readr functions |
Description |
|---|---|
read_delim() |
reads in a flat file data with a given character to separate fields |
read_csv() |
reads in a CSV file |
read_tsv() |
reads in a file with values separated by tabs |
read_lines() |
reads only a certain number of lines from the file |
read_file() |
reads a complete file into a string |
write_csv() |
writes data frame to CSV |
A useful cheatsheet for the functions in the readr package can be found on RStudio’s website:
Combining the functions in the packages dplyr and kableExtra can help us take a glance at the data. Here, we use the mutate to generate a new character variable named NPI and use the kable() and kable_styling to make the display clearer.
| NPI | NPPES Provider Last Name | NPPES Provider First Name | NPPES Provider ZIP Code | NPPES Provider State | Specialty Description | Total Claim Count | Opioid Claim Count | Opioid Prescribing Rate | Extended-Release Opioid Claims | Extended-Release Opioid Prescribing Rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 1003000126 | ENKESHAFI | ARDALAN | 21502 | MD | Internal Medicine | 545 | 23 | 4.22 | NA | NA |
| 1003000142 | KHALIL | RASHID | 43623 | OH | Anesthesiology | 1733 | 1004 | 57.93 | 63 | 6.27 |
| 1003000167 | ESCOBAR | JULIO | 89403 | NV | Dentist | 49 | 11 | 22.45 | 0 | 0.00 |
| 1003000282 | BLAKEMORE | ROSIE | 37243 | TN | Nurse Practitioner | 146 | NA | NA | 0 | NA |
| 1003000407 | GIRARDI | DAVID | 15825 | PA | Family Practice | 2225 | 17 | 0.76 | 0 | 0.00 |
| 1003000423 | VELOTTA | JENNIFER | 44106 | OH | Obstetrics & Gynecology | 170 | NA | NA | 0 | NA |
We can see that the data is structured as provider-based level, meaning that the information in each role is for the specific health care provider. In this project, we’re going to do the analysis at State level, so you can probably imagine that we need to have some process to get State-level information from this data. Before moving there, let’s take a look at how big the data is for the 2016 information:
[1] 1131550 11
Wow! There are 1131550 health providers listed in this file. Of note, not every health provider had prescribed opioid in the year 2016.
In many cases, we want to add basic information about States, such as State name, Region, and State abbreviation in our dataset. We can get this information easily using the dataset state in the library datasets. However, before taking advantage of the state, we need to take a look at our data, and have an idea about whether we can adopt the information in state directly.
#### How many States are there in the 2016 prescription dataset?
[1] "MD" "OH" "NV" "TN" "PA" "CO" "FL" "OK" "CA" "TX" "OR" "KY" "SC" "CT" "NY"
[16] "NJ" "NC" "MI" "MA" "MS" "VA" "WA" "NM" "NH" "IA" "MN" "NE" "DC" "GA" "HI"
[31] "LA" "ND" "IN" "AZ" "IL" "AL" "PR" "AR" "KS" "RI" "WI" "ID" "MO" "WV" "VT"
[46] "UT" "ME" "AE" "MT" "DE" "WY" "SD" "AK" "ZZ" "GU" "VI" "MP" "AA" "AP" "XX"
[61] "AS"
There are 61 values for NPPES Provider State. From the manual, we know that other than the fifty States and DC, there are the other 10 values as following:
Values for NPPES Provider State |
Description |
|---|---|
XX |
Unknown |
AA |
Armed Forces Central/South America |
AE |
Armed Forces Europe |
AP |
Armed Forces Pacific |
AS |
American Samoa |
GU |
Guam |
MP |
Northern Mariana Islands |
PR |
Puerto Rico |
VI |
Virgin Islands |
ZZ |
Foreign Country |
state in the datasets [1] "Alabama" "Alaska" "Arizona" "Arkansas"
[5] "California" "Colorado" "Connecticut" "Delaware"
[9] "Florida" "Georgia" "Hawaii" "Idaho"
[13] "Illinois" "Indiana" "Iowa" "Kansas"
[17] "Kentucky" "Louisiana" "Maine" "Maryland"
[21] "Massachusetts" "Michigan" "Minnesota" "Mississippi"
[25] "Missouri" "Montana" "Nebraska" "Nevada"
[29] "New Hampshire" "New Jersey" "New Mexico" "New York"
[33] "North Carolina" "North Dakota" "Ohio" "Oklahoma"
[37] "Oregon" "Pennsylvania" "Rhode Island" "South Carolina"
[41] "South Dakota" "Tennessee" "Texas" "Utah"
[45] "Vermont" "Virginia" "Washington" "West Virginia"
[49] "Wisconsin" "Wyoming"
state.abb <- c(state.abb, "DC",
"XX",
"AA",
"AE",
"AP",
"AS",
"GU",
"MP",
"PR",
"VI",
"ZZ")
state.region <- as.factor(c(as.character(state.region), "South",
rep("Others",10)))
state.name <- c(state.name, "District of Columbia",
"Unknown",
"Armed Forces Central/South America",
"Armed Forces Europe",
"Armed Forces Pacific",
"American Samoa",
"Guam",
"Northern Mariana Islands",
"Puerto Rico",
"Virgin Islands",
"Foreign Country")
states <- unique(prescription_2016$`NPPES Provider State`) %>% data.frame(.)
names(states) <- "NPPES Provider State"
states$state_name <- state.name[match(tolower(states$`NPPES Provider State`), tolower(state.abb))]
states$region <- state.region[match(tolower(states$`NPPES Provider State`), tolower(state.abb))]
states %>%
kable() %>%
kable_styling() | NPPES Provider State | state_name | region |
|---|---|---|
| MD | Maryland | South |
| OH | Ohio | North Central |
| NV | Nevada | West |
| TN | Tennessee | South |
| PA | Pennsylvania | Northeast |
| CO | Colorado | West |
| FL | Florida | South |
| OK | Oklahoma | South |
| CA | California | West |
| TX | Texas | South |
| OR | Oregon | West |
| KY | Kentucky | South |
| SC | South Carolina | South |
| CT | Connecticut | Northeast |
| NY | New York | Northeast |
| NJ | New Jersey | Northeast |
| NC | North Carolina | South |
| MI | Michigan | North Central |
| MA | Massachusetts | Northeast |
| MS | Mississippi | South |
| VA | Virginia | South |
| WA | Washington | West |
| NM | New Mexico | West |
| NH | New Hampshire | Northeast |
| IA | Iowa | North Central |
| MN | Minnesota | North Central |
| NE | Nebraska | North Central |
| DC | District of Columbia | South |
| GA | Georgia | South |
| HI | Hawaii | West |
| LA | Louisiana | South |
| ND | North Dakota | North Central |
| IN | Indiana | North Central |
| AZ | Arizona | West |
| IL | Illinois | North Central |
| AL | Alabama | South |
| PR | Puerto Rico | Others |
| AR | Arkansas | South |
| KS | Kansas | North Central |
| RI | Rhode Island | Northeast |
| WI | Wisconsin | North Central |
| ID | Idaho | West |
| MO | Missouri | North Central |
| WV | West Virginia | South |
| VT | Vermont | Northeast |
| UT | Utah | West |
| ME | Maine | Northeast |
| AE | Armed Forces Europe | Others |
| MT | Montana | West |
| DE | Delaware | South |
| WY | Wyoming | West |
| SD | South Dakota | North Central |
| AK | Alaska | West |
| ZZ | Foreign Country | Others |
| GU | Guam | Others |
| VI | Virgin Islands | Others |
| MP | Northern Mariana Islands | Others |
| AA | Armed Forces Central/South America | Others |
| AP | Armed Forces Pacific | Others |
| XX | Unknown | Others |
| AS | American Samoa | Others |
Since it’ reasonable to have more opioid claims simply due to more participants enrolled in the Part D, we may want to compare the opioid prescription per 100 enrollments instead of total number of opioid prescriptions across States. Thus, one of the questions we are interested in is what’s the distribution of opioid prescription per 100 enrollments by States. To answer our question, other than the opioid claim information we’ve already got, we also need the information about how many participants enrolled in the Part D plan for each State. To the best of our knowledge, we can get the 2015 enrollment information from the publicly available CMS website.
(Note: The information is based on 2015. Please let us know if there are year-by-year State-level Part D enrollment information, so that we can update this analysis.)
population <- read_csv("./doc2/enrollnumber.csv") %>%
rename( state_name = State,
enrollment_num2015 = Total)Now, we can add the number of enrollments in Part D by State from population to the States information dataset states.
Note that there is some difference in the names of States between population and states. They are “Pending State Designation” and “Foreign and Other Outlying Areas”. Thus, in the further analysis, we can consider if we want to add them in or exclude them.
[1] "Alabama" "Alaska"
[3] "Arizona" "Arkansas"
[5] "California" "Colorado"
[7] "Connecticut" "Delaware"
[9] "District of Columbia" "Florida"
[11] "Georgia" "Hawaii"
[13] "Idaho" "Illinois"
[15] "Indiana" "Iowa"
[17] "Kansas" "Kentucky"
[19] "Louisiana" "Maine"
[21] "Maryland" "Massachusetts"
[23] "Michigan" "Minnesota"
[25] "Mississippi" "Missouri"
[27] "Montana" "Nebraska"
[29] "Nevada" "New Hampshire"
[31] "New Jersey" "New Mexico"
[33] "New York" "North Carolina"
[35] "North Dakota" "Ohio"
[37] "Oklahoma" "Oregon"
[39] "Pennsylvania" "Rhode Island"
[41] "South Carolina" "South Dakota"
[43] "Tennessee" "Texas"
[45] "Utah" "Vermont"
[47] "Virginia" "Washington"
[49] "West Virginia" "Wisconsin"
[51] "Wyoming" "American Samoa"
[53] "Guam" "Northern Mariana Islands"
[55] "Puerto Rico" "Virgin Islands"
[57] "Pending State Designation" "Foreign and Other Outlying Areas"
[59] "Total"
Now, we can combine the population information with the state, and save it as a new data frame called populationstate
Let’s see what the populationstate looks like.
| NPPES Provider State | state_name | region | enrollment_num2015 |
|---|---|---|---|
| MD | Maryland | South | 544529 |
| OH | Ohio | North Central | 1634697 |
| NV | Nevada | West | 294927 |
| TN | Tennessee | South | 903638 |
| PA | Pennsylvania | Northeast | 1878311 |
| CO | Colorado | West | 548360 |
[1] 56
In the dataset prescription_2016, we have information about total claim count, opioid claim count, and extended-release opioid claims. For this project, we only need to look at opioid claim count and total claim count.
[1] 0
[1] 319003
We noticed that there are some missing values due in Opioid Claim Count. This tells us that not every provider included in this dataset is eligible to prescribe opioid. Thus, in the next step, when we want to calculate the opioid prescription rate by State, we need to limit our data to those providers with both total claim information and opioid claim information.
Great! Now we have better idea about our dataset, and we need to create a dataset containing the information we need to answer our questions. In other words, we want to create a dataset for each year with following information:
| Variable Name | Description |
|---|---|
| NPPES Provider State | State Abbreviation |
| total_opioid_claim | Total opioid claims (Part D) in the State |
| total_claim_count | Total claims (Part D) in the State |
| state_opioid_prescribing_rate | The proportion of total opioid claims versus total claims (Part D) in the State |
| state_name | State Name |
| region | The region of the State |
| enrollment_num2015 | Number of Part D enrollments in 2015 |
| total_claim_bypop | Total claims per 100 enrollment (*) |
| opioid_claim_bypop | Opioid claims per 100 enrollment (*) |
| year | year of the data came from |
(*) These numbers are valid only for year 2015.
Since we’re going to use the same chuck of codes many times, we can write a function named anadata to save our times!
anadata <- function(data = data, year = year){
c <- year
data %>%
left_join(.,populationstate,by = c("NPPES Provider State")) %>%
select(`NPPES Provider State`,
state_name,region,
`Opioid Claim Count`,
`Total Claim Count`) %>%
filter(is.na(`Total Claim Count`) +
is.na(`Opioid Claim Count`) == 0 ) %>%
group_by(`NPPES Provider State`) %>%
summarise(total_opioid_claim = sum(`Opioid Claim Count`),
total_claim_count = sum(`Total Claim Count`)) %>%
mutate(state_opioid_prescribing_rate =
(total_opioid_claim/total_claim_count)*100) %>%
arrange(desc(state_opioid_prescribing_rate)) %>%
left_join(.,populationstate,by = c("NPPES Provider State")) %>%
mutate(total_claim_bypop =
(total_claim_count/enrollment_num2015)*100,
opioid_claim_bypop =
(total_opioid_claim/enrollment_num2015)*100) %>%
filter(!is.na(state.name)) %>%
mutate(year = c) -> x;
names(x) = tolower(names(x)); # We change the variable names to lowercase.
return(x)
}
prescriber2016byState <- anadata(prescription_2016,2016)
prescriber2015byState <- anadata(prescription_2015,2015)
prescriber2014byState <- anadata(prescription_2014,2014)
prescriber2013byState <- anadata(prescription_2013,2013)Here, we write a loop to execute the repetitive procedures. The alternative to do it is to implement some commands in the package purrr. [ Here is the cheatsheet for using purrr]
Now, we’re ready for our next step - data visualization!
The first task is to understand the opioid prescriptions by States in 2016. We are going to use the functions in the library ggplot2 to help us plot the bar chart.
library(ggplot2)
p <- prescriber2016byState %>%
ggplot(aes(x=reorder(state_name,
-total_opioid_claim),
y=total_opioid_claim)) +
geom_bar(stat="identity")
p + theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 90, hjust = 1,vjust = 0.5)) +
ggtitle("Total opioid claims (in Medicare Part D) by State in 2016") +
xlab("States") +
ylab("Number of opioid claims") It’s nice to visualize all the information we have from this dataset. However, based on our original questions, we may want to focus on the 50 States and DC. Since for all the regions other than the 50 States and DC, they were labelled as Others in their region variable, we can limit the analysis based on them using the filter .
p <- prescriber2016byState %>%
filter(region!="Others") %>%
ggplot(aes(x=reorder(state_name,
-total_opioid_claim),
y=total_opioid_claim)) +
geom_bar(stat="identity")
p + theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 90, hjust = 1,vjust = 0.5)) +
ggtitle("Total opioid claims (in Medicare Part D) by State in 2016") +
xlab("States") +
ylab("Number of opioid claims") You may wonder why the labels on the y-axis are 6e+06 , 4e+06 , …, which are relatively hard to read. Here comes one potential solution - using the scale_y_continuous(labels = comma) in the library scales. This function will force the display of labels in y-axis be an easy-to-read number.
library(scales)
p + scale_y_continuous(labels = comma) +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 90, hjust = 1,vjust = 0.5)) +
ggtitle("Total opioid claims (in Medicare Part D) by State in 2016") +
xlab("States") +
ylab("Number of opioid claims") You may feel it’s hard to read since you need to either tilt your head or have a special ability to read the labels of States in the x-axis. Here, we provide one way to rotate this bar chart using coord_flip() .
p + scale_y_continuous(labels = comma) +
coord_flip() +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Total opioid claims (in Medicare Part D) by State in 2016") +
xlab("States") +
ylab("Number of opioid claims") Based on the plot above, California, Florida, Texas, Pennsylvania, and Michigan are the top five States with the highest opioid prescription claims in Part D in 2016. And the top five States with the lowest opioid prescription claims in Part D in 2016 are District of Columbia, Alaska, Wyoming, Vermont, and Hawaii.
p <- prescriber2016byState %>%
filter(region != "Others") %>%
arrange(.,-total_opioid_claim) %>%
filter(row_number() %in% c(1:5,
n()-4,
n()-3,
n()-2,
n()-1,
n() # n() means the number of rows
# Thus, n() is the one with highet total_opioid_claim
)) %>%
mutate(rankgroup = ifelse(row_number()<=5,"Lowest five",
"Highest five")) %>%
ggplot(aes(x=reorder(state_name,
-total_opioid_claim),
y=total_opioid_claim)) +
geom_bar(stat="identity")
p + scale_y_continuous(labels = comma) +
coord_flip() +
facet_wrap(~rankgroup, scales="free") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
plot.subtitle = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Total opioid claims (in Medicare Part D) by State in 2016",
subtitle = "Top five States with the highest and lowest opioid prescription") +
xlab("States") +
ylab("Number of opioid claims") The plots did show the numbers of total opioid claims right. However, the audience need to read the x-axis carefully. Otherwise, they may misinterpret the results since the scales are so different. In this case, it may be a good idea to show a table with the numbers.
prescriber2016byState %>%
filter(region != "Others") %>%
arrange(.,-total_opioid_claim) %>%
filter(row_number() %in% c(1:5,
n()-4,
n()-3,
n()-2,
n()-1,
n())) %>%
select(state_name,total_opioid_claim) %>%
kable() %>%
kable_styling()| state_name | total_opioid_claim |
|---|---|
| California | 6956583 |
| Florida | 5568389 |
| Texas | 5012297 |
| Pennsylvania | 3525981 |
| Michigan | 3337506 |
| Hawaii | 155670 |
| Vermont | 146706 |
| Wyoming | 104558 |
| Alaska | 82557 |
| District of Columbia | 79879 |
Alternatively, you can show the top 5 by
prescriber2016byState %>%
filter(region != "Others") %>%
arrange(.,-total_opioid_claim) %>%
filter(row_number() %in% c(1:5)) %>%
select(state_name,total_opioid_claim) %>%
kable() %>%
kable_styling()| state_name | total_opioid_claim |
|---|---|
| California | 6956583 |
| Florida | 5568389 |
| Texas | 5012297 |
| Pennsylvania | 3525981 |
| Michigan | 3337506 |
And you can show the lower 5 by
prescriber2016byState %>%
filter(region != "Others") %>%
arrange(.,-total_opioid_claim) %>%
filter(row_number() %in% c(n()-4,
n()-3,
n()-2,
n()-1,
n())) %>%
select(state_name,total_opioid_claim) %>%
kable() %>%
kable_styling()| state_name | total_opioid_claim |
|---|---|
| Hawaii | 155670 |
| Vermont | 146706 |
| Wyoming | 104558 |
| Alaska | 82557 |
| District of Columbia | 79879 |
Probably you’re not surprised by the results, since the population in California, Florida, and Texas is large, and the population in District of Columbia, Alaska, and Wyoming is small.
To make a fair comparison across different States, we may want to adjust for “size of States.” There are many good options to achieve this goal. In this case study, we provide two potential methods:
Create a variable called opioid prescription rate (defined by CMS). Opioid prescription rate is the proportion of opioid claims among the total claims in a certain year. Since total claims can be a proxy of the number of enrollments in Part D, opioid prescription rate can be considered as a fair comparison across States.
Create a variable called opioid prescription claims per 100 enrollments . Since the enrollment number by States was available in 2015, we can create this variable to directly adjust for the population size. (Please let us know if there’s any publicly available information about Part D enrollment by States in the other years.)
To visualize our hypothesis, we will have two plots:
(1) The plot showing the number of enrollments versus number of opioid claims
library(ggrepel)
p <- prescriber2015byState %>%
filter(region != "Others") %>%
ggplot(aes(x = enrollment_num2015,
y = total_opioid_claim,
color = region)) +
geom_point() +
geom_smooth(method = "lm", col = "blue")
p + scale_y_continuous(labels = comma) +
scale_x_continuous(labels = comma) +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
plot.subtitle = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Number of enrollments vs. Opioid prescription claims in Medicare Part D\ by State in 2015",
subtitle = "Number of enrollments is associated with number of opioid prescriptions") +
xlab(" Number of enrollments ") +
ylab(" Number of opioid claims ") +
geom_text_repel(aes(label=`nppes provider state`)) p <- prescriber2015byState %>%
filter(region != "Others") %>%
ggplot(aes(x = total_claim_count,
y = total_opioid_claim,
color = region)) +
geom_point() +
geom_smooth(method = "lm", col = "blue")
p + scale_y_continuous(labels = comma) +
scale_x_continuous(labels = comma) +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
plot.subtitle = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Total claims vs. Opioid prescription claims in Medicare Part D\ by State in 2015",
subtitle = "Number of total claims is associated with number of opioid prescriptions") +
xlab(" Number of total claims ") +
ylab(" Number of opioid claims ") +
geom_text_repel(aes(label=`nppes provider state`)) p <- prescriber2016byState %>%
filter(region != "Others") %>%
ggplot(aes(x=reorder(state_name,
-state_opioid_prescribing_rate),
y=state_opioid_prescribing_rate)) +
geom_bar(stat="identity")
p + theme_bw() +
coord_flip() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Opioid Prescription Rate (in Medicare Part D) by State in 2016") +
xlab("States") +
ylab("Opioid Prescription Rate (Number of opioid claims per 100 claims)") From this plot, we can see that in Part D in 2016, the opioid prescription rates in Nevada, Utah, Alabama, Oklahoma, Idaho are the highest, and the opioid prescription rates in New York, Rhode Island, Hawaii, Massachusetts, and North Dakota are the lowest.
We may want to see the trend of opioid prescription rate during 2013 and 2016 by State.
Here, we append the data from different years in long format first, and then plot the trend of opioid prescription rate.
p <- prescriber2016byState %>%
union(.,prescriber2015byState) %>%
union(.,prescriber2014byState) %>%
union(.,prescriber2013byState) %>%
filter(region != "Others") %>%
ggplot(aes(x=reorder(state_name,
-state_opioid_prescribing_rate),
y=state_opioid_prescribing_rate,
fill = as.factor(year))) +
geom_bar(position = "dodge", stat="identity")
p + theme_bw() +
coord_flip() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Trend of opioid Prescription Rate (in Medicare Part D) by State during 2013 - 2016") +
xlab("States") +
ylab("Opioid Prescription Rate (Number of opioid claims per 100 claims)") +
labs(fill = "Year") ### Visualize/Animate the change by States using bar chart
The alternative approach is to animate ther results using gganimate.
# <Ref 1> https://gganimate.com/articles/gganimate.html
# <Ref 2> https://towardsdatascience.com/create-animated-bar-charts-using-r-31d09e5841da
library(gganimate)
library(gifski)
data4animation <- prescriber2016byState %>%
union(.,prescriber2015byState) %>%
union(.,prescriber2014byState) %>%
union(.,prescriber2013byState) %>%
filter(region != "Others") %>%
group_by(year) %>%
mutate(rank = rank(-state_opioid_prescribing_rate),
value_rel = state_opioid_prescribing_rate/state_opioid_prescribing_rate[rank==1],
value_std = round(state_opioid_prescribing_rate,2)) %>%
group_by(state_name) %>%
filter(rank <= 20) %>% # choose the top 20
ungroup()
staticplot = ggplot(data4animation,
aes(rank,
group = state_name,
fill = as.factor(state_name),
color = as.factor(state_name))) +
geom_tile(aes(y = round(state_opioid_prescribing_rate/2,2),
height = round(state_opioid_prescribing_rate,2),
width = 0.9), alpha = 0.8, color = NA) +
geom_text(aes(y = 0, label = paste(state_name, " ")), vjust = 0.2, hjust = 1) +
geom_text(aes(y= value_std,
label = value_std, hjust=0)) +
coord_flip(clip = "off", expand = FALSE) +
scale_y_continuous(labels = scales::comma) +
scale_x_reverse() +
guides(color = FALSE, fill = FALSE) +
theme(axis.line=element_blank(),
axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),
legend.position="none",
panel.background=element_blank(),
panel.border=element_blank(),
panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
panel.grid.major.x = element_line( size=.1, color="grey" ),
panel.grid.minor.x = element_line( size=.1, color="grey" ),
plot.title=element_text(size=10, hjust=0.5, face="bold", colour="grey", vjust=-1),
plot.subtitle=element_text(size=6, hjust=0.5, face="italic", color="grey"),
plot.caption =element_text(size=6, hjust=0.5, face="italic", color="grey"),
plot.background=element_blank() ,
plot.margin = margin(2,2, 1, 3, "cm"))
staticplotanimation::ani.options(ani.width= 1000, ani.height=1000, ani.res = 1000)
anim = staticplot + transition_states(year,
transition_length = 3,
state_length = 2) +
view_follow(fixed_x = TRUE) +
labs(title = "Trend of opioid Prescription Rate (in Medicare Part D) by State {closest_state}",
subtitle = "Opioid Prescription Rate (Number of opioid claims per 100 claims)",
caption = "Data Source: CMS Data"
)
anim # This is the line to save the animation into gif format
# animate(anim, 200, fps = 20, width = 1200, height = 1000,
# renderer = gifski_renderer("./img/gganim.gif"))In general, the trend the opioid prescription rate was decreasing during 2013 - 2016 across most of the States.
fouryeardata <- prescriber2016byState %>%
union(.,prescriber2015byState) %>%
union(.,prescriber2014byState) %>%
union(.,prescriber2013byState) %>%
filter(region != "Others") %>%
group_by(state_name) %>%
summarise(max_opioid_claim = max(total_opioid_claim),
min_opioid_claim = min(total_opioid_claim)) %>%
mutate(variation = (max_opioid_claim - min_opioid_claim) / max_opioid_claim )
p <- fouryeardata %>%
ggplot(aes(x=reorder(state_name,-variation),
y=variation)) +
geom_bar(position = "dodge", stat="identity")
p + theme_bw() +
coord_flip() +
theme(plot.title = element_text(hjust = 0.5 ),
plot.subtitle = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Variation of total opioid claims (in Medicare Part D) by State during 2013 - 2016",
subtitle = "Variation = (maximun opioid claims - minimum opioid claims) / maximum opioid claims") +
xlab("States") +
ylab("Variation of total opioid claims") There are many ways to display the longitudinal trends of opioid prescription rate. In the plot shown above, the longitudinal change was shown clearly by States. However, it has some disadvantages:
(1) It is difficult for us to compare the trend from State to State directly.
(2) It is hard for us to explore the potential reasons to explain the difference between different States.
Now, suppose we want to visually explore whether the change in opioid prescription rate over time was associate with region, we may want to have a plot showing the changes over time by States, which also incorporated the region information. Based on what we have established, we need to organize our data in long format and add the group information in our ggplot.
datamodel <- prescriber2016byState %>%
union(.,prescriber2015byState) %>%
union(.,prescriber2014byState) %>%
union(.,prescriber2013byState) %>%
filter(region != "Others") %>%
mutate(interval = year - 2013)
p <- datamodel %>%
ggplot(aes(x = year,
y = state_opioid_prescribing_rate,
group = state_name,
color = region)) +
geom_line()
p + theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 0.5,vjust = 0.5)) +
ggtitle("Trend of opioid Prescription Rate (in Medicare Part D) by State during 2013 - 2016") +
xlab("Calender Year") +
ylab("Opioid Prescription Rate (Number of opioid claims per 100 claims)") +
labs(color = "Region") Also, you may want to add a “summarized” line for each region. In other words, the question is whether we can visually display the summarized longitudinal trend in each region. Here, we provide a way that can achieve this goal in a few seconds. The idea is to take the average of opioid prescription rate in each region at year 2013, 2014, 2015, and 2016, and further then plot the trend of the region-specific average opioid prescription rate over time.
p <- datamodel %>%
ggplot(aes(x = year,
y = state_opioid_prescribing_rate,
group = state_name,
color = region)) +
stat_summary(aes(group = region),
geom = "line",
fun.y = mean, size = 5) +
geom_line()
p + theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 0.5,vjust = 0.5)) +
ggtitle("Trend of opioid Prescription Rate (in Medicare Part D) by State during 2013 - 2016") +
xlab("Calender Year") +
ylab("Opioid Prescription Rate (Number of opioid claims per 100 claims)") +
labs(color = "Region") Since there are many policies made at State level, and it’s relatively difficult to get information about the policy in militaries, foreign countries, and unknown areas, we may want to focus on the areas we are able to get the information about the number of Part D Part enrollment.
p <- prescriber2015byState %>%
filter(region != "Others") %>% # This line removes the region/States without population information.
ggplot(aes(x=reorder(state_name,
-state_opioid_prescribing_rate),
y=state_opioid_prescribing_rate)) +
geom_bar(position = "dodge", stat="identity")
p + theme_bw() +
coord_flip() +
theme(plot.title = element_text(hjust = 0.5 ),
plot.subtitle = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Opioid prescrion rate (in Medicare Part D) by State in 2015",
subtitle = "Opioid prescription rate = ( # of opioid claims / # of total claims ) * 100 %") +
xlab("States") +
ylab("Opioid prescription rate (%)") We may want to see opioid prescription per 100 enrollment in 2015.
p <- prescriber2015byState %>%
filter(region != "Others") %>%
ggplot(aes(x=reorder(state_name,
-opioid_claim_bypop),
y=opioid_claim_bypop)) +
geom_bar(position = "dodge", stat="identity")
p + theme_bw() +
coord_flip() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Opioid prescription per 100 enrollments in Medicare Part D\n by State in 2015") +
xlab("States") +
ylab("Number of opioid prescription claims per 100 enrollments") The opioid prescription rate was high in American Samoa, but the opioid prescription per 100 enrollments is low in American Samoa. This may be due to there are relatively few total claims in American Samoa, but the proportion of opioid prescription is high. The opioid prescription rates in Utah, Nevada, Oklahoma, Alabama, Idaho, Colorado, Tennessee, and Oregon are also relatively high. The opioid prescription per 100 enrollments are relatively high in Tennessee, Alabama, and Oklahoma.
To visualize the information from these two methods, we propose to compare the “rankings” from two methods. First, we will create one variable called rank_method1, which is the rank based on the method I (state opioid prescription rate), and the other variable called rank_method2 , which is the rank based on the method II (opioid claim by number of enrollment). This can be done by using the dense_rank(). Then, we can have an x-y plot showing the ranking between these two methods.
## create a new dataframe called "method_comparison"
method_comparison <- prescriber2015byState %>%
filter(complete.cases(.)) %>%
filter(region != "Others") %>%
mutate(rank_method1 = dense_rank(desc(state_opioid_prescribing_rate)),
rank_method2 = dense_rank(desc(opioid_claim_bypop))) %>%
arrange(-opioid_claim_bypop)
## Take a look at the "method_comparison"
method_comparison %>%
head(.) %>%
kable() %>%
kable_styling()| nppes provider state | total_opioid_claim | total_claim_count | state_opioid_prescribing_rate | state_name | region | enrollment_num2015 | total_claim_bypop | opioid_claim_bypop | year | rank_method1 | rank_method2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TN | 2923689 | 36817973 | 7.940929 | Tennessee | South | 903638 | 4074.416 | 323.5465 | 2015 | 7 | 1 |
| AL | 2179506 | 26521232 | 8.217967 | Alabama | South | 681722 | 3890.329 | 319.7060 | 2015 | 4 | 2 |
| OK | 1286402 | 15645299 | 8.222291 | Oklahoma | South | 442217 | 3537.923 | 290.8984 | 2015 | 3 | 3 |
| AR | 1132475 | 16230105 | 6.977620 | Arkansas | South | 403326 | 4024.066 | 280.7840 | 2015 | 17 | 4 |
| MS | 1083075 | 16386685 | 6.609482 | Mississippi | South | 392353 | 4176.516 | 276.0461 | 2015 | 24 | 5 |
| KY | 1710346 | 27234066 | 6.280171 | Kentucky | South | 630711 | 4317.994 | 271.1774 | 2015 | 31 | 6 |
## Plot the comparison
p <- method_comparison %>%
ggplot(aes(x=rank_method1,
y = rank_method2,
color = region)) +
geom_point() +
geom_abline(slope=1, intercept=0)
p + theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Comparison of two methods ranking opioid prescription in Medicare Part D \n by State in 2015",
subtitle = "Lower number in rank means higher opioid prescription rate.") +
xlab("Rank by Method I (opioid prescription rate)") +
ylab("Rank by Method II (opioid prescription by population)") +
geom_text_repel(aes(label=state_name)) You can tell that these two methods are consistent with some exceptions. One potential reason is that the population composition in each State is different, making the distributions of total prescriptions different. Take State A, with more older residents (>= 80 y/o), and State B, with fewer older residents (>= 80 y/o). Then, it’s understandable that the average total prescriptions per person in State A may be higher than the average total prescriptions per person in State B. There are pros and cons of these two different methods, and which should we use is highly dependent on the questions you want to address.
You may want to also add the information about number of enrollments in the plot. Here, we provide one potential way to do it - making the size of the dot proportional to the number of enrollment. This can be achieved by geom_point(aes(size = THEVARIABLEYOUWANTTOPUT )).
p <- method_comparison %>%
ggplot(aes(x=rank_method1,
y = rank_method2,
color = region)) +
geom_point(aes(size = enrollment_num2015)) +
geom_abline(slope=1, intercept=0)
p + theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Comparison of two methods ranking opioid prescription in Medicare Part D by State in 2015",
subtitle = "Lower number in rank means higher opioid prescription rate. \nSize of the point is proportional to the number of enrollment.") +
xlab("Rank by Method I (opioid prescription rate)") +
ylab("Rank by Method II (opioid prescription by population)") +
scale_size_continuous(labels = comma) +
geom_text_repel(aes(label=state_name)) In many cases, you may want to integrate your information on a map so that don’t need to map the information on the bar chart to the U.S. map in your brain. Here, we will use the choroplethr and choroplethrMaps to help us achieve this goal. There is a nice website introducing these packages. The other common approach is to use ggmap, which may ask you to apply for a Google API key in advance.
We have done this earlier. This means that the dataset state has been edited by us. To use the choroplethrMaps, we need the original state dataset, so we call the library again, and ask for the original state dataset.
# http://www.bargava.com/Intro-to-Choropleth-using-R/
# install.packages("choroplethr")
# install.packages("choroplethrMaps")
library(datasets)
data(state)
unique(state.name) # There are only 50 States [1] "Alabama" "Alaska" "Arizona" "Arkansas"
[5] "California" "Colorado" "Connecticut" "Delaware"
[9] "Florida" "Georgia" "Hawaii" "Idaho"
[13] "Illinois" "Indiana" "Iowa" "Kansas"
[17] "Kentucky" "Louisiana" "Maine" "Maryland"
[21] "Massachusetts" "Michigan" "Minnesota" "Mississippi"
[25] "Missouri" "Montana" "Nebraska" "Nevada"
[29] "New Hampshire" "New Jersey" "New Mexico" "New York"
[33] "North Carolina" "North Dakota" "Ohio" "Oklahoma"
[37] "Oregon" "Pennsylvania" "Rhode Island" "South Carolina"
[41] "South Dakota" "Tennessee" "Texas" "Utah"
[45] "Vermont" "Virginia" "Washington" "West Virginia"
[49] "Wisconsin" "Wyoming"
The function we need for plot the State level map is state_choropleth, which needs the information about (1) States’ location information, and (2) values, as input. The df_pop_state contained information about (1) States’ location information, so all we need to do is to add the information about (2) values, which is the opioid Prescription per 100 enrollments by States.
library(choroplethr)
library(choroplethrMaps)
x <- prescriber2015byState %>%
filter(!is.na(enrollment_num2015)) %>%
mutate(region = tolower(state_name),
value = opioid_claim_bypop) %>%
filter(!(region %in% c("american samoa",
"guam",
"northern mariana islands",
"virgin islands",
"puerto rico") ) ) %>%
select(region,value)
data("df_pop_state")
y <- df_pop_state %>%
rename(valueold = value) %>%
left_join(.,x, by = c("region")) %>%
select(region,value)
p <- state_choropleth(y,
legend = "Opioid Prescrition Claims \n opioid per 100 enrollments",
num_colors = 5)
p + ggtitle("Opioid Prescription per 100 enrollments (in Medicare Part D) by State in 2015") +
theme(plot.title = element_text(hjust = 0.5 )) What we need to do is to replace the information about (2) values, with the opioid prescription rates by States.
x <- prescriber2015byState %>%
filter(!is.na(enrollment_num2015)) %>%
mutate(region = tolower(state_name),
value = state_opioid_prescribing_rate) %>%
filter(!(region %in% c("american samoa",
"guam",
"northern mariana islands",
"virgin islands",
"puerto rico") ) ) %>%
select(region,value)
y <- df_pop_state %>%
rename(valueold = value) %>%
left_join(.,x, by = c("region")) %>%
select(region,value)
p <- state_choropleth(y,
legend = "Opioid Prescrition Rate \n opioid claims per 100 total claims",
num_colors = 5)
p + ggtitle("Opioid Prescription Rate (in Medicare Part D) by State in 2015") +
theme(plot.title = element_text(hjust = 0.5 )) California, Florida, Texas, Pennsylvania, and Michigan are the top five States with the highest opioid prescription claims in Part D in 2016. However, after taking the population of the States into account, they were not the top five States with the highest opioid prescription rates. Instead, Nevada, Utah, Alabama, Oklahoma, Idaho had the highest opioid prescription claims per 100 enrollments. In general, the trend the opioid prescription rate was decreasing during 2013 - 2016 across most of the States. However, the data for this analysis came from Medicare Part D, and thus opioid prescription information for relatively younger adults was missing in the analysis. Combining these data with opioid-related policies across States may be more informative.
There are at least two extentions you can make from this case study.
The first one is linking this with more publicly available data. The second one is to demonstrate longitudinal data analysis.
Medicare Part D beneficiaries information is publicly available, and may provide additional information you are interested in. The information about Medicare Part D beneficiaries are publicly available. Here, we demonstrate how to download file from the website directly using the method as the aforementioned. However, this is only for demonstration purpose, and not going to be used in this project.
if(!file.exists("./data/2016partDprescriberinfo.csv")){
file_link <- "https://data.cms.gov/api/views/yvpj-pmj2/rows.csv"
download.file(file_link,
destfile = "./data/2016partDprescriberinfo.csv",mode = "wb")
}
if(!file.exists("./data/2015partDprescriberinfo.csv")){
file_link <- "https://data.cms.gov/api/views/3z4d-vmhm/rows.csv"
download.file(file_link,
destfile = "./data/2015partDprescriberinfo.csv",mode = "wb")
}
if(!file.exists("./data/2014partDprescriberinfo.csv")){
file_link <- "https://data.cms.gov/api/views/465c-49pb/rows.csv"
download.file(file_link,
destfile = "./data/2014partDprescriberinfo.csv",mode = "wb")
}
if(!file.exists("./data/2013partDprescriberinfo.csv")){
file_link <- "https://data.cms.gov/api/views/4uvc-gbfz/rows.csv"
download.file(file_link,
destfile = "./data/2013partDprescriberinfo.csv",mode = "wb")
}The following is the code book. In other words, the description of the variables in the dataset we just downloaded is summarized in the following table: (Of note, NPI stands for National Provider Identifier, and NPPES stands for CMS National Plan and Provider Enumeration System. In general, healthcare providers have their unique 10-digit NPIs to identify themselves, and more information can be found in the NPI Registry Public Search)
partdinfodescription = read.csv("./doc2/partdinfodescription.csv")
names(partdinfodescription)[1] <- "Column Name"
partdinfodescription %>%
select(`Column Name`,Description) %>%
kable() %>%
kable_styling() %>%
column_spec(1, bold = T, border_right = T, background = "white") %>%
column_spec(2, width = "30em", background = "lightyellow")| Column Name | Description |
|---|---|
| npi | National Provider Identifier |
| nppes_provider_last_org_name | Last Name/Organization Name of the Provider |
| nppes_provider_first_name | First Name of the Provider |
| nppes_provider_city | City of the Provider |
| nppes_provider_state | State Code of the Provider |
| specialty_description | Provider Specialty Type |
| description_flag | Source of Provider Specialty |
| drug_name | Brand Name |
| generic_name | USAN Generic Name - Short Version |
| bene_count | Number of Medicare Beneficiaries |
| total_claim_count | Number of Medicare Part D Claims, Including Refills |
| total_30_day_fill_count | Number of Standardized 30-Day Fills, Including Refills |
| total_day_supply | Number of Day’s Supply for All Claims |
| total_drug_cost | Aggregate Cost Paid for All Claims |
| bene_count_ge65 | Number of Medicare Beneficiaries Age 65+ |
| bene_count_ge65_suppress_flag | Reason for Suppression of Bene_Count_Ge65 |
| total_claim_count_ge65 | Number of Claims, Including Refills, for Beneficiaries Age 65+ |
| ge65_suppress_flag | Reason for Suppression of Total_Claim_Count_Ge65, Total_30_Day_Fill_Count_Ge65, Total_Day_Supply_Ge65, and Total_Drug_Cost_Ge65 |
| total_30_day_fill_count_ge65 | Number of Standardized 30-Day Fills, Including Refills, for Beneficiaries Age 65+ |
| total_day_supply_ge65 | Number of Day’s Supply for All Claims for Beneficiaries Age 65+ |
| total_drug_cost_ge65 | Aggregate Cost Paid for All Claims for Beneficiaries Age 65+ |
The above plots showing the longitudinal trend may inspire you the following questions:
Is the region associated with the opioid prescription rate at the baseline (aka, at year 2013)?
Is the region associated with the rate of decline in opioid prescription rate across time during 2013 - 2016?
These are all great questions, but we are not going to discuss about this deeply in this case study. If you are interested in these questions, we would recommend you to read some materials about longitudinal data analysis. For your convenience, we do provide some sample codes. We will run two linear mixed effect models using the package lme4, and provide brief interpretation. In the first model fit1, we are investigating whether the region is associated with the opioid prescription rate in the year 2013 while we don’t allow the change of opioid prescription rate to be differed by region. In the second model fit2, we are investigating whether the region is associated with the opioid prescription rate in the year 2013 and whether region associated with the rate of decline in opioid prescription rate across time during 2013 - 2016 at the same time while we allow the change of opioid prescription rate to be differed by regions.
( Note: The instructors still need to add the interpretation of random intercept and random slope.)
library(lme4)
fit1 <- lmer(state_opioid_prescribing_rate ~ as.factor(region) + interval +
(1 | state_name),
data = datamodel)
summary(fit1)Linear mixed model fit by REML ['lmerMod']
Formula: state_opioid_prescribing_rate ~ as.factor(region) + interval +
(1 | state_name)
Data: datamodel
REML criterion at convergence: 69.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.4733 -0.5243 0.0713 0.6071 2.5069
Random effects:
Groups Name Variance Std.Dev.
state_name (Intercept) 0.93704 0.9680
Residual 0.02214 0.1488
Number of obs: 204, groups: state_name, 51
Fixed effects:
Estimate Std. Error t value
(Intercept) 6.354847 0.280613 22.646
as.factor(region)Northeast -0.835871 0.428111 -1.952
as.factor(region)South 0.773766 0.366052 2.114
as.factor(region)West 1.269896 0.388657 3.267
interval -0.169917 0.009318 -18.236
Correlation of Fixed Effects:
(Intr) as.()N as.()S as.()W
as.fctr(r)N -0.654
as.fctr(r)S -0.765 0.501
as.fctr(r)W -0.720 0.472 0.552
interval -0.050 0.000 0.000 0.000
fit2 <- lmer(state_opioid_prescribing_rate ~ as.factor(region)*interval + interval +
(1 + interval | state_name),
data = datamodel)
summary(fit2)Linear mixed model fit by REML ['lmerMod']
Formula: state_opioid_prescribing_rate ~ as.factor(region) * interval +
interval + (1 + interval | state_name)
Data: datamodel
REML criterion at convergence: 32
Scaled residuals:
Min 1Q Median 3Q Max
-1.96922 -0.57516 -0.06762 0.56844 1.95501
Random effects:
Groups Name Variance Std.Dev. Corr
state_name (Intercept) 1.038002 1.01882
interval 0.006639 0.08148 -0.45
Residual 0.010294 0.10146
Number of obs: 204, groups: state_name, 51
Fixed effects:
Estimate Std. Error t value
(Intercept) 6.3729408 0.2951282 21.594
as.factor(region)Northeast -0.7965514 0.4508158 -1.767
as.factor(region)South 0.7735943 0.3854653 2.007
as.factor(region)West 1.1719152 0.4092692 2.863
interval -0.1819800 0.0269231 -6.759
as.factor(region)Northeast:interval -0.0262130 0.0411257 -0.637
as.factor(region)South:interval 0.0001146 0.0351641 0.003
as.factor(region)West:interval 0.0653206 0.0373356 1.750
Correlation of Fixed Effects:
(Intr) as.()N as.()S as.()W intrvl a.()N: a.()S:
as.fctr(r)N -0.655
as.fctr(r)S -0.766 0.501
as.fctr(r)W -0.721 0.472 0.552
interval -0.428 0.280 0.327 0.308
as.fctr()N: 0.280 -0.428 -0.214 -0.202 -0.655
as.fctr()S: 0.327 -0.214 -0.428 -0.236 -0.766 0.501
as.fctr()W: 0.308 -0.202 -0.236 -0.428 -0.721 0.472 0.552
From these models, we can conclude that the opioid prescription rate was decreasing over time during 2013 - 2016. Though there are some difference in the opioid prescription rate across different regions at the baseline (in the year 2013), there is no obvious difference in the trend of decline across different regions.